CHAPTER 8 Getting Your Data into the Computer 107
unknown, refused, or not applicable). The goal is to make sure that for every cate-
gorical variable, a numerical code is entered and the cell is not left blank.
Never try to cram multiple choices into one column! For example, don’t enter 1, 2
into a cell in the CaregiverType column to indicate the patient has a nurse and phy-
sician. If you do, you have to painstakingly split your single multi-valued column
into separate two-state flag columns (described earlier) before you analyze the
data. Why not do it right the first time?
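To see what that painstaking split involves, here is a minimal Python sketch. It assumes a cell that holds comma-separated codes, with 1 meaning nurse and 2 meaning physician as in the example above. The third code, the flag names, and the function split_multi_valued are hypothetical, made up only for illustration.

```python
# Hypothetical code list for the CaregiverType column (assumed for illustration).
CAREGIVER_CODES = {1: "Nurse", 2: "Physician", 3: "FamilyMember"}

def split_multi_valued(cell):
    """Turn a cell such as "1, 2" into one 0/1 flag per caregiver type."""
    # Collect the codes found in the cell, ignoring empty pieces.
    chosen = {int(part) for part in cell.split(",") if part.strip()}
    # Build one two-state flag (1 = yes, 0 = no) for every possible code.
    return {name: int(code in chosen) for code, name in CAREGIVER_CODES.items()}

flags = split_multi_valued("1, 2")
print(flags)  # {'Nurse': 1, 'Physician': 1, 'FamilyMember': 0}
```

Entering the flags as separate columns from the start makes a cleanup step like this unnecessary.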
Recording numerical data
For numerical data (meaning interval and ratio data), the main issue is how much
precision to record. Recording a numeric value to as many decimals as you have
available is usually best. For example, if a scale can measure body weight to the
nearest tenth of a kilogram, record it in the database to that degree of precision.
You can always round off to the nearest kilogram later if you want, but you can
never “unround” a number to recover digits you didn’t record. So it’s best to
record values from measurement instruments at the full precision the
instrument provides.
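A short Python sketch shows why rounding is a one-way street. The weights are made-up readings from a scale assumed to report tenths of a kilogram.

```python
# Made-up body weights, recorded to the nearest tenth of a kilogram.
weights_kg = [70.4, 70.6, 71.2]

# Rounding to the nearest kilogram can always be done later, on demand.
rounded = [round(w) for w in weights_kg]
print(rounded)  # [70, 71, 71]

# The last two values are now identical, so nothing in the rounded data
# can tell you that one patient weighed 70.6 kg and the other 71.2 kg.
```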
Along the same lines, don’t group numerical data into intervals when recording it.
If you know the age to the nearest year, don’t record Age in 10-year intervals (such
as 20 to 29, 30 to 39, 40 to 49, and so on). You can always have the computer do
that kind of grouping later, but you can never recover the age in years if all you
record is the decade.
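Having the computer do the grouping later takes only a few lines. This is a sketch in Python, assuming ages recorded in whole years; the function name age_decade and the sample ages are hypothetical.

```python
def age_decade(age):
    """Return the 10-year interval (such as '20 to 29') containing an age in years."""
    low = (age // 10) * 10  # lower bound of the decade
    return f"{low} to {low + 9}"

ages = [23, 29, 41]  # made-up ages, recorded to the nearest year
groups = [age_decade(a) for a in ages]
print(groups)  # ['20 to 29', '20 to 29', '40 to 49']
```

Going the other way is impossible: the label 20 to 29 fits both the 23-year-old and the 29-year-old.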
Some statistical programs let you store numbers in different formats. The pro-
gram may refer to these different storage modes using arcane terms for short, long,
or very long integers (whole numbers) or single-precision (short) or double-precision
(long) floating point (fractional) numbers. Each type has its own limits, which may
vary from one program to another or from one kind of computer to another. For
example, a short integer may be able to represent only whole numbers within the
range from −32,768 to 32,767, whereas a double-precision floating-point number
could easily handle a number like 1.23456789012345 × 10^250. Excel has no trouble
storing numerical data in any of these formats, so before choosing among them, it is best
to study the statistical program you will use to analyze the data. That way, you can
make rules for storing the data in Excel that make it easy for you to analyze the
data once it is imported into the statistical program.
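You can get a feel for these limits with Python's standard struct module. This is only a sketch: it assumes a short integer means a 16-bit signed integer (which matches the range quoted above) and that single and double precision mean the usual 32-bit and 64-bit floating-point formats. Your statistical program may define its types differently, and the helper name fits is hypothetical.

```python
import struct

def fits(fmt, value):
    """Return True if value can be packed into the given struct format."""
    try:
        struct.pack(fmt, value)
        return True
    except (struct.error, OverflowError):
        # Python raises one of these when the value is out of range.
        return False

print(fits("<h", 32767))   # 16-bit signed integer, upper limit
print(fits("<h", 32768))   # one past the upper limit

big = 1.23456789012345e250
print(fits("<f", big))     # single precision (32-bit)
print(fits("<d", big))     # double precision (64-bit)
```

Under these assumptions, 32767 fits in a short integer but 32768 does not, and the large number fits only in double precision.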
Following are issues to consider with respect to numerical variables in Excel:
» Don’t put two numbers (such as a blood pressure reading of 135 / 85 mmHg)
into one column of data. Excel won’t complain about it, but it will treat it as